4. Identify Lines Using HITRAN Manually



[1]:
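# Import necessary modules (os, numpy, and pandas are used directly in the cells below)
import os
import numpy as np
import pandas as pd
from Xpectra.SpecFitAnalyzer import SpecFitAnalyzer
from Xpectra.LineAssigner import *
from Xpectra.SpecStatVisualizer import plot_fitted_als_bokeh, plot_spectra_errorbar_bokeh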

4.1 - Load the original and baseline-corrected spectra

\(\rightarrow\) In step 2, we corrected the spectral baseline and saved the result as a CSV file in the processed_data directory. Here we load that file into a DataFrame:

[2]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Import baseline corrected spectrum
corrected_spectrum = pd.read_csv(os.path.join(__reference_data_path__,'processed_data','arpls_baseline_corrected_methane_spectrum.csv'))

# Assign wavenumber (x) and signal (y) arrays
x = corrected_spectrum['original_x'].dropna().to_numpy()
y = corrected_spectrum['original_y'].dropna().to_numpy()

x_baseline_corr = corrected_spectrum['baseline_corrected_x'].dropna().to_numpy()
y_baseline_corr = corrected_spectrum['baseline_corrected_y'].dropna().to_numpy()
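
\(\rightarrow\) The `.dropna()` calls presumably guard against columns of unequal length: if the original and corrected spectra were saved with different numbers of points, pandas pads the shorter columns with NaN on read. A minimal sketch with made-up numbers (the column names match the cell above; the values and the `demo_*` names are hypothetical):

```python
import io

import pandas as pd

# Hypothetical CSV: the corrected spectrum has one point fewer than the original,
# so its columns end with empty fields that pandas reads as NaN.
csv_text = """original_x,original_y,baseline_corrected_x,baseline_corrected_y
2911.0,0.50,2911.0,0.10
2911.1,0.70,2911.1,0.30
2911.2,0.60,,
"""

demo = pd.read_csv(io.StringIO(csv_text))

# Dropping NaN per column recovers each array at its true length
demo_x = demo['original_x'].dropna().to_numpy()
demo_x_corr = demo['baseline_corrected_x'].dropna().to_numpy()

print(len(demo_x), len(demo_x_corr))
```

If both spectra share the same grid, the two arrays simply come out with equal length.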

\(\rightarrow\) Visualize the imported spectra:

[3]:
# Obtain previously fitted baseline by reverse correcting the spectrum
spectral_baseline = y - y_baseline_corr

plot_fitted_als_bokeh(wavenumber_values = x,
                      signal_values = y,
                      fitted_baseline = spectral_baseline,
                      baseline_type = 'arpls'
                     )
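
\(\rightarrow\) The subtraction `y - y_baseline_corr` only recovers the baseline if both arrays have the same length and sit on the same wavenumber grid. A small sketch of that check, using synthetic arrays in place of the real data (all names and values here are illustrative):

```python
import numpy as np

# Synthetic stand-ins for the notebook arrays (values are illustrative only)
grid = np.linspace(2911.0, 2912.0, 5)
true_baseline = 0.2 + 0.05 * (grid - 2911.0)
raw_signal = true_baseline + np.array([0.0, 0.4, 0.1, 0.3, 0.0])
corrected_signal = raw_signal - true_baseline

# Guard: element-wise subtraction needs equal-length arrays
if raw_signal.shape != corrected_signal.shape:
    raise ValueError("original and corrected spectra differ in length")

# Reverse the correction to get the fitted baseline back
recovered_baseline = raw_signal - corrected_signal
print(np.allclose(recovered_baseline, true_baseline))
```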

4.2 - Load the HITRAN line list and parse it

\(\rightarrow\) The next step is to load the HITRAN line list into a DataFrame. For this, we use the LineAssigner class, instantiating it with the baseline-corrected spectrum and the HITRAN file path.

[4]:
# Call environment variable and assign path to data
__reference_data_path__ = os.getenv("Xpectra_reference_data")

# Define path to HITRAN data
input_file = os.path.join(__reference_data_path__, 'datasets','CH4_nu3.par')

# Initialize LineAssigner
assign = LineAssigner(wavenumber_values = x_baseline_corr,
                      signal_values = y_baseline_corr,
                      hitran_file = input_file,
                      absorber_name= 'CH4')
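
\(\rightarrow\) For orientation, here is a sketch of how one record of a `.par` file can be split, assuming CH4_nu3.par follows the standard 160-character fixed-width HITRAN layout. This is not necessarily how LineAssigner reads the file; `FIELDS`, `split_par_record`, and the demo record are hypothetical names and made-up values for illustration.

```python
# Field names and widths of the 160-character fixed-width HITRAN record
FIELDS = [("molec_id", 2), ("local_iso_id", 1), ("nu", 12), ("sw", 10),
          ("a", 10), ("gamma_air", 5), ("gamma_self", 5), ("elower", 10),
          ("n_air", 4), ("delta_air", 8),
          ("global_upper_quanta", 15), ("global_lower_quanta", 15),
          ("local_upper_quanta", 15), ("local_lower_quanta", 15),
          ("ierr", 6), ("iref", 12), ("line_mixing_flag", 1),
          ("gp", 7), ("gpp", 7)]


def split_par_record(record):
    """Slice one fixed-width record into a dict of stripped strings."""
    parsed, start = {}, 0
    for name, width in FIELDS:
        parsed[name] = record[start:start + width].strip()
        start += width
    return parsed


# Made-up record: each value is right-justified to its field width.
# The quanta entries are placeholders, not real quantum-number strings.
demo_values = ["6", "1", "2911.186061", "5.284E-23", "5.794E-02", ".0576", "0.070",
               "575.2596", "0.67", "-.007580", "upper-global", "lower-global",
               "upper-local", "lower-local", "434536", "3", "", "57.0", "63.0"]
demo_record = "".join(value.rjust(width)
                      for value, (_, width) in zip(demo_values, FIELDS))

demo_line = split_par_record(demo_record)
print(len(demo_record), demo_line["molec_id"], float(demo_line["nu"]))
```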

\(\rightarrow\) With the class initialized, we now parse the line list into a DataFrame. By default, the columns converted to the DataFrame are: ‘local_iso_id’, ‘nu’, ‘sw’, ‘gamma_air’, ‘local_upper_quanta’, and ‘ierr’.

\(\rightarrow\) This function automatically separates the local quanta terms into the J quantum number, the N quantum number, and the symmetry.
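
\(\rightarrow\) To make the split concrete, here is a minimal sketch that assumes a local-quanta string of the form "J, symmetry label, N" (for example `12A1  1`). The exact string layout in the file and the way Xpectra parses it are assumptions; `parse_local_quanta` is a hypothetical helper.

```python
import re

# J (integer), symmetry label (A1, A2, E, F1, F2), N (integer)
QUANTA_PATTERN = re.compile(r"(\d+)\s*(A[12]|F[12]|E)\s*(\d+)")


def parse_local_quanta(text):
    """Return J, symmetry, and N from a local-quanta string, or None if it does not match."""
    match = QUANTA_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    j_value, symmetry, n_value = match.groups()
    return {"J": int(j_value), "sym": symmetry, "N": int(n_value)}


print(parse_local_quanta("   12A1  1     "))
print(parse_local_quanta("   12E  25     "))
```

Returning `None` for unmatched strings keeps malformed records visible instead of silently filling in values.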

[5]:
# Parse file to DataFrame
assign.parse_file_to_dataframe()
[5]:
       molec_id  local_iso_id           nu            sw         a  gamma_air  gamma_self     elower  n_air  delta_air  ...  iref  line_mixing_flag         gp   gpp  J_low  sym_low  N_low  J_up  sym_up  N_up
    0         6             2  2900.000621  1.825000e-25  0.023890     0.0490       0.067   814.6845   0.63  -0.005800  ...    64                 3  3253433.0  None     12       A1      1    13      A2     9
    1         6             2  2900.005693  6.307000e-27  0.005030     0.0470       0.065  1096.0334   0.62  -0.005800  ...    64                 3  3253433.0  None     14       F2      3    14      F1    40
    2         6             2  2900.022027  3.048000e-27  0.022620     0.0460       0.060  1593.6378   0.61  -0.005800  ...    64                 3  3253433.0  None     17       F2      2    17      F1    47
    3         6             1  2900.027223  1.891000e-25  0.000465     0.0480       0.067   815.1315   0.63  -0.005800  ...    34                 3  3245363.0  None     12       F1      3    13      F2    21
    4         6             2  2900.035027  1.905000e-25  0.067460     0.0400       0.067   815.0317   0.63  -0.005800  ...    64                 3  3253433.0  None     12        E      2    12       E    25
  ...       ...           ...          ...           ...       ...        ...         ...        ...    ...        ...  ...   ...               ...        ...   ...    ...      ...    ...   ...     ...   ...
41623         6             2  3299.877822  1.652000e-29  0.000353     0.0450       0.059  1780.0695   0.60  -0.006500  ...    54                 3  3253433.0  None     18       F2      3    19      F1    85
41624         6             1  3299.900527  5.946000e-29  0.000004     0.0380       0.061  1416.5543   0.61  -0.006500  ...    34                 3  3240363.0  None     16        E      1    17       E    52
41625         6             3  3299.901848  7.204000e-29  0.000221     0.0589       0.077   532.9581   0.75  -0.006346  ...    44                 4  2243323.0  None     11        E      4    11       E     2
41626         6             1  3299.984795  2.838000e-25  0.035670     0.0470       0.099  1526.2146   0.75  -0.006600  ...    32                 3  3333232.0  None      6       A2      1     6      A1    31
41627         6             2  3299.989099  5.343000e-29  0.000730     0.0380       0.060  1594.0043   0.61  -0.006500  ...    54                 3  3253433.0  None     17        E      2    18       E    54

41628 rows × 25 columns

\(\rightarrow\) The HITRAN DataFrame is now accessible through the class attribute hitran_df:

[6]:
# Display header and first 3 rows
assign.hitran_df.head(3)
[6]:
   molec_id  local_iso_id           nu            sw        a  gamma_air  gamma_self     elower  n_air  delta_air  ...  iref  line_mixing_flag         gp   gpp  J_low  sym_low  N_low  J_up  sym_up  N_up
0         6             2  2900.000621  1.825000e-25  0.02389      0.049       0.067   814.6845   0.63    -0.0058  ...    64                 3  3253433.0  None     12       A1      1    13      A2     9
1         6             2  2900.005693  6.307000e-27  0.00503      0.047       0.065  1096.0334   0.62    -0.0058  ...    64                 3  3253433.0  None     14       F2      3    14      F1    40
2         6             2  2900.022027  3.048000e-27  0.02262      0.046       0.060  1593.6378   0.61    -0.0058  ...    64                 3  3253433.0  None     17       F2      2    17      F1    47

3 rows × 25 columns

4.3 - Identify Peaks Manually

\(\rightarrow\) We move on to identifying the location (in wavenumber) of each peak in our methane spectrum. To accomplish this, we use the LineAssigner module.

4.3.1 - Select wavenumber range for analysis

\(\rightarrow\) Often we are only interested in part of the spectrum, or the full spectrum has too many peaks to process at once. We therefore select a range of wavenumbers for our analysis:

[7]:
wavenumber_range = (2911.15, 2911.9) # cm^-1
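
\(\rightarrow\) Selecting a range amounts to a boolean mask over the wavenumber array. The sketch below uses synthetic arrays and treats both endpoints as inclusive, which is an assumption about how the Xpectra functions apply `wavenumber_range`:

```python
import numpy as np

# Synthetic spectrum (values are illustrative only)
demo_wavenumbers = np.array([2911.0, 2911.2, 2911.5, 2911.8, 2912.0])
demo_signal = np.array([0.1, 0.5, 0.2, 0.4, 0.1])
demo_range = (2911.15, 2911.9)  # cm^-1

# Keep only the points whose wavenumber falls inside the range
in_range = (demo_wavenumbers >= demo_range[0]) & (demo_wavenumbers <= demo_range[1])
x_selected = demo_wavenumbers[in_range]
y_selected = demo_signal[in_range]

print(x_selected, y_selected)
```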

\(\rightarrow\) Let's visualize the spectrum within this wavenumber range:

[8]:
plot_spectra_errorbar_bokeh(wavenumber_values = x_baseline_corr,
                            signal_values = y_baseline_corr,
                            wavenumber_range = wavenumber_range,
                            absorber_name = 'CH4',
                            plot_type = 'line')

4.3.2 - Find the peaks manually

\(\rightarrow\) Find the spectral peaks manually by clicking on the figure, which prints the coordinates of each selected point:

[9]:
assign.line_finder_manual(wavenumber_range=wavenumber_range)

\(\rightarrow\) Paste the printed peak coordinates into a list of [wavenumber, signal] pairs, then extract the peak centers and heights:

[10]:
guesses_list = [[2911.187, 0.504], [2911.262, 0.594], [2911.287, 0.403],
                [2911.350, 0.545], [2911.402, 0.450], [2911.518, 0.160],
                [2911.623, 0.549], [2911.676, 0.100], [2911.698, 0.195]]

initial_guesses = np.array(guesses_list)

peak_centers = initial_guesses[:,0]
peak_heights = initial_guesses[:,1]
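
\(\rightarrow\) Clicked coordinates are only as precise as the click. As an optional refinement that is not part of the Xpectra workflow shown here, each guess could be snapped to the highest data point within a small window around it. The sketch below assumes peaks are local maxima of the signal; `snap_to_local_max` and the synthetic peak are hypothetical.

```python
import numpy as np


def snap_to_local_max(wavenumbers, signal, guess, half_window=0.01):
    """Return (wavenumber, signal) of the highest sample within +/- half_window of guess."""
    candidates = np.flatnonzero(np.abs(wavenumbers - guess) <= half_window)
    if candidates.size == 0:
        # No data point near the click: keep the guess, height unknown
        return guess, np.nan
    best = candidates[np.argmax(signal[candidates])]
    return wavenumbers[best], signal[best]


# Synthetic Gaussian peak centered at 2911.262 cm^-1 on a 0.001 cm^-1 grid
demo_grid = np.linspace(2911.2, 2911.3, 101)
demo_peak = 0.6 * np.exp(-0.5 * ((demo_grid - 2911.262) / 0.005) ** 2)

# A click that lands slightly off-center
snapped_center, snapped_height = snap_to_local_max(demo_grid, demo_peak, guess=2911.258)
print(snapped_center, snapped_height)
```

The window should stay narrower than the spacing between neighboring peaks, otherwise a guess can jump to the wrong line.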

\(\rightarrow\) Manually update the class instance with the selected peak centers:

[11]:
assign.peak_centers_manual = peak_centers

4.4 - Identify the lines

\(\rightarrow\) Compare peaks with known lines

\(\rightarrow\) Find the closest line from HITRAN line list for each peak in the lab spectrum
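
\(\rightarrow\) The matching idea can be sketched as follows: filter the line list, then for each peak take the line with the smallest wavenumber offset and keep it only if the offset is within the threshold. This is an illustration, not the implementation of `hitran_line_assigner`; `match_lines` is a hypothetical helper, and the inclusive threshold in cm^-1 is an assumption. The first three `nu` values are HITRAN lines from this spectral window; the `local_iso_id` 3 line is made up to show the effect of the filter.

```python
import numpy as np
import pandas as pd

demo_lines = pd.DataFrame({
    "local_iso_id": [1, 1, 2, 3],
    "nu": [2911.186061, 2911.261561, 2911.348367, 2911.349000],  # last entry is made up
})
demo_peaks = np.array([2911.187, 2911.262, 2911.350, 2911.500])


def match_lines(peaks, lines, threshold, allowed_iso):
    """Pair each peak with the nearest allowed line if it lies within threshold."""
    candidates = lines[lines["local_iso_id"].isin(allowed_iso)].reset_index(drop=True)
    kept_rows, kept_peaks = [], []
    for peak in peaks:
        offsets = (candidates["nu"] - peak).abs()
        nearest = offsets.idxmin()
        if offsets[nearest] <= threshold:
            kept_rows.append(nearest)
            kept_peaks.append(peak)
    matched = candidates.loc[kept_rows].reset_index(drop=True)
    matched["peak_center"] = kept_peaks
    return matched


demo_matched = match_lines(demo_peaks, demo_lines, threshold=0.02, allowed_iso=[1, 2])
print(demo_matched)
```

In this sketch the peak at 2911.500 has no line within 0.02 cm^-1 and is dropped, and the filtered-out line cannot be matched even though it lies closer to the third peak.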

[12]:
# Filter the HITRAN line list
filters = {'local_iso_id' : [1,2]} # Only search isotopologues with local_iso_id 1 and 2


# Match found lines, plot them over spectrum, and display DataFrame
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['local_iso_id', 'J_up','nu','peak_center'], # Print over each line
                            wavenumber_range = wavenumber_range,
                            __print__ = True, # Display the fitted HITRAN DataFrame
                            __plot_bokeh__ = True, # Plot interactively with Bokeh
                            __plot_seaborn__ = False
                           )
   molec_id  local_iso_id           nu            sw         a  gamma_air  gamma_self     elower  n_air  delta_air  ...  line_mixing_flag         gp   gpp  J_low  sym_low  N_low  J_up  sym_up  N_up  peak_center
0         6             1  2911.186061  5.284000e-23  0.057940     0.0576       0.070   575.2596   0.67  -0.007580  ...                 3  4345363.0  None     10       F1      2     9      F2    35     2911.187
1         6             1  2911.261561  6.751000e-23  0.074010     0.0572       0.070   575.1841   0.67  -0.008480  ...                 3  4345363.0  None     10       F1      1     9      F2    35     2911.262
2         6             1  2911.285780  3.903000e-23  0.042810     0.0576       0.070   575.2852   0.67  -0.007600  ...                 3  4345363.0  None     10       F2      3     9      F1    36     2911.287
3         6             2  2911.348367  5.866000e-23  0.602700     0.0618       0.085   104.7777   0.75  -0.002122  ...                 3  3335212.0  None      4       A1      1     5      A2     6     2911.350
4         6             1  2911.401080  4.331000e-23  0.047480     0.0573       0.070   575.1699   0.67  -0.008890  ...                 3  4345363.0  None     10       F2      2     9      F1    36     2911.402
5         6             1  2911.518480  1.271000e-23  0.013930     0.0583       0.070   575.0525   0.67  -0.008430  ...                 3  4345363.0  None     10       F2      1     9      F1    36     2911.518
6         6             1  2911.622555  5.719000e-23  0.037600     0.0587       0.070   575.0555   0.67  -0.008330  ...                 3  4345363.0  None     10       A2      1     9      A1    11     2911.623
7         6             1  2911.674563  7.653000e-24  3.939000     0.0390       0.062  1817.8431   0.75  -0.005823  ...                 3  3333232.0  None      9       F2      7     8      F1    75     2911.676
8         6             2  2911.697399  3.172000e-29  0.000225     0.0450       0.060  1594.1021   0.61  -0.005800  ...                 3  3253433.0  None     17       F1      4    18      F2    27     2911.698

9 rows × 26 columns

4.5 - Save the results: plots and DataFrames

\(\rightarrow\) Use the plot-saving option of hitran_line_assigner:

[13]:
assign.hitran_line_assigner(threshold = 0.02,
                            filters = filters,
                            columns_to_print = ['nu','peak_center'],
                            wavenumber_range = wavenumber_range,
                            __save_plot__ = True, # Save the plot (seaborn version)
                            __reference_data__ = __reference_data_path__)
<Figure size 7000x4200 with 0 Axes>
[14]:
# Add the manually selected peak heights (assumes one matched line per peak, in the same order)
assign.fitted_hitran['peak_heights'] = peak_heights

\(\rightarrow\) Save fitted HITRAN DataFrame to CSV file

[15]:
df = assign.fitted_hitran

# Define file name
file_name = "closest_hitran_lines_manual.csv"

# Save DataFrame to CSV
df.to_csv(os.path.join(__reference_data_path__,'processed_data',file_name), index=False)
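
\(\rightarrow\) `to_csv` does not create missing directories, so it can help to make sure the output folder exists before saving. The sketch below uses a temporary directory in place of the reference data path and a one-row stand-in for the fitted DataFrame; both are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

# One-row stand-in for assign.fitted_hitran (illustrative only)
demo_df = pd.DataFrame({"nu": [2911.186061],
                        "peak_center": [2911.187],
                        "peak_heights": [0.504]})

with tempfile.TemporaryDirectory() as base_dir:  # stands in for the reference data path
    out_dir = os.path.join(base_dir, "processed_data")
    os.makedirs(out_dir, exist_ok=True)          # no error if the folder already exists
    out_path = os.path.join(out_dir, "closest_hitran_lines_manual.csv")

    demo_df.to_csv(out_path, index=False)
    reloaded = pd.read_csv(out_path)             # read back to confirm the file is usable

print(reloaded.shape, list(reloaded.columns))
```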